PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them
Authors
Abstract
Open-domain Question Answering models that directly leverage question-answer (QA) pairs, such as closed-book QA (CBQA) models and QA-pair retrievers, show promise in terms of speed and memory compared with conventional models which retrieve and read from text corpora. QA-pair retrievers also offer interpretable answers, a high degree of control, and are trivial to update at test time with new knowledge. However, these models fall short of the accuracy of retrieve-and-read systems, as substantially less knowledge is covered by the available QA-pairs relative to text corpora like Wikipedia. To facilitate improved QA-pair models, we introduce Probably Asked Questions (PAQ), a very large resource of 65M automatically generated QA-pairs. We introduce a new QA-pair retriever, RePAQ, to complement PAQ. We find that PAQ preempts and caches test questions, enabling RePAQ to match the accuracy of recent retrieve-and-read models, whilst being significantly faster. Using PAQ, we train CBQA models which outperform comparable baselines by 5%, but trail RePAQ by over 15%, indicating the effectiveness of explicit retrieval. RePAQ can be configured for size (under 500MB) or speed (over 1K questions per second) while retaining high accuracy. Lastly, we demonstrate RePAQ’s strength at selective QA, abstaining from answering when it is likely to be incorrect. This enables RePAQ to “back-off” to a more expensive state-of-the-art model, leading to a combined system which is both more accurate and 2x faster than the state-of-the-art model alone.
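The selective-QA "back-off" idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: `Answer`, `answer_with_backoff`, `toy_fast_model`, `toy_slow_model`, and the `threshold` value are all hypothetical names and settings chosen for illustration. The sketch assumes the fast model returns an answer together with a confidence score, and that the system defers to the slower model whenever that score falls below the threshold.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Answer:
    text: str
    source: str  # "fast" if the cheap model answered, "slow" if we backed off


def answer_with_backoff(
    question: str,
    fast_model: Callable[[str], Tuple[str, float]],  # hypothetical: returns (answer, confidence)
    slow_model: Callable[[str], str],                # hypothetical: expensive fallback model
    threshold: float = 0.5,                          # illustrative value, not taken from the paper
) -> Answer:
    """Answer with the fast model when it is confident, otherwise back off."""
    text, confidence = fast_model(question)
    if confidence >= threshold:
        return Answer(text, "fast")
    # The fast model abstains, so the expensive model handles the question.
    return Answer(slow_model(question), "slow")


def toy_fast_model(question: str) -> Tuple[str, float]:
    """Toy stand-in for a QA-pair retriever: exact-match lookup in a tiny cache."""
    cache = {"who wrote hamlet?": "William Shakespeare"}
    key = question.strip().lower()
    if key in cache:
        return cache[key], 1.0
    return "", 0.0


def toy_slow_model(question: str) -> str:
    """Toy stand-in for an expensive retrieve-and-read model."""
    return "slow-model answer"
```

Calling `answer_with_backoff("Who wrote Hamlet?", toy_fast_model, toy_slow_model)` is served by the toy cache, while a question missing from the cache falls through to the slow model. In a real system, choosing the threshold would trade accuracy against the share of questions answered cheaply.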
Similar resources
What You Can Do with Coordinated Samples
Sample coordination, where similar instances have similar samples, was proposed by statisticians four decades ago as a way to maximize overlap in repeated surveys. Coordinated sampling had been since used for summarizing massive data sets. The usefulness of a sampling scheme hinges on the scope and accuracy within which queries posed over the original data can be answered from t...
What speech synthesis can do for you (and what you can do for speech synthesis)
This is a companion paper to my keynote talk at ICPhS 2015. It provides a guide to help readers familiarise themselves with recent advances in speech synthesis, with an emphasis on approaches that might provide useful tools to investigate speech, particularly by constructing experimental stimuli for perceptual experiments.
Guest Editorial: Research Data: It's What You Do With Them
These days it may be stating the obvious that the number of data resources, their complexity and diversity is growing rapidly due to the compound effects of increasing speed and resolution of digital instruments, due to pervasive data-collection automation and due to the growing power of computers. Just because we are becoming used to the accelerating growth of data resources, it does not mean ...
Jtag/ Boundary Scan – What Can It Do for You and What Do You Have to Do?
1 Testing in an Integrated Circuitry. Since the existence of integrated circuitries, there has been the necessity to check their functions. In the case of digital circuitries, a test is quite simple: all possible test vectors are applied in succession, and then the circuitries’ reactions at the outputs (actual value) are compared to the expected patterns (nominal value). If there are no differen...
What are Exercises and How Should You Do Them?
This is a comment a student wrote in fall, 1997 on the evaluations of the exercises that you will use for this class: "I have taken this class before at another college. I got F's my first two tests, so I decided to take it over here. I got an A+ on my first test. What helped me is the beginning of the packet at the IU Bookstore that explained how to answer your questions." This exercise covers...
Journal
Journal title: Transactions of the Association for Computational Linguistics
Year: 2021
ISSN: 2307-387X
DOI: https://doi.org/10.1162/tacl_a_00415